home *** CD-ROM | disk | FTP | other *** search
- Path: anvil.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c
- Subject: Re: C uses simple LALR parsing?
- Date: 14 Mar 1996 17:18:13 -0800
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4iagglINNdu7@anvil.ugrad.cs.ubc.ca>
- References: <4i9ilq$92a@sargas.omicron.se>
- NNTP-Posting-Host: anvil.ugrad.cs.ubc.ca
-
- In article <4i9ilq$92a@sargas.omicron.se>,
- Elias Martenson <elias@omicron.se> wrote:
- >This question has been subject to major debate among me and my firends:
- >
- >Does the C standard specify what kind of parser to use? If it uses
- >(and is does seem that at least AT&T C and GCC does) a simple
- >LALR parser it would mean that the following expression:
- >
- > a+++++b
- >
- >Yields:
- >
- > SYMBOL INCR INCR ADD SYMBOL
- >
- >Instead of the desired result (which is acieved when doing a++ + ++b):
- >
- > SYMBOL INCR ADD INCR SYNBOL
- >
- >Is it legal for a C compiler to parse the expression like the last
- >example?
-
- The distinction you are describing has nothing to do with LALR(1) parsing, and
- everything with _lexical analysis_.
-
- Lexical analysis is based on regular expressions, which are weaker than
- LR grammars (a regular expression is basically a grammar which is only right
- recursive---a non terminal symbol never appears between two terminals in any
- production), hence it cannot match things like nested parentheses.
-
- The answer to your question is, yes, the C standard lexical rules are indeed
- unambigous. The expression 'a+++++b' will be tokenized as 'a ++ ++ + b'.
-
- The rule used is simple: extract from the input the longest lexeme that matches
- the regular expression pattern. When faced with '+++++++', the lexical analyzer
- will extract '++' rather than '+'. Both are matching tokens, but the first one
- is longer.
-
- The compiler will not figure out the alternate scan just to eliminate the
- syntax error.
- --
-
-